{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div align=\"right\"><i>Peter Norvig<br>12 August 2019<br>Update Oct 2020</i></div>\n",
    "\n",
    "# Data and Code for [Tracking Trump: Electoral Votes Edition](Electoral%20Votes.ipynb)\n",
    "\n",
    " *Morning Consult* has a **[Tracking Trump](https://morningconsult.com/tracking-trump/)** web page that\n",
    " gives state-by-state, month-by-month presidential approval poll data.  Within the web page there is some Javascript from which\n",
    " we can extract the data we need. It looks like this:\n",
    "\n",
    "     var mc_state_trend = [[\"Demographic\",\"January '17\",\"\",\"February '17\",\"\", ...]\n",
    "                           [\"Alabama\",\"62\",\"26\",\"65\",\"29\", ...], \n",
    "                           ... ]\n",
    "                           \n",
    "The first row is a header (each date is a month at which polls were aggregated).\n",
    "The subsequent rows each start with the state name, followed by the approval and disapproval percentages for each date. That is, if there are 34 dates, there will by 68 numbers. The row shown above is saying that in January, 2017, 62% of Alabamans approved and 26% disapproved; then in February, 2017, 65% approved and 29% disapproved, and so on. Our job is to extract this data and find ways to visualize and understand it.\n",
    "\n",
    "First fetch the page and save it locally:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "! [ -e evs.html ] || curl -s -o evs.html https://morningconsult.com/tracking-trump-2/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now some imports: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import re\n",
    "import ast\n",
    "from collections import namedtuple\n",
    "from IPython.display import display, Markdown\n",
    "from statistics import stdev"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additional data: the variable `state_data` contains the [electoral votes by state](https://www.britannica.com/topic/United-States-Electoral-College-Votes-by-State-1787124) and the [partisan lean by state](https://github.com/fivethirtyeight/data/tree/master/partisan-lean) (how much more Republican (plus) or Democratic (minus) leaning the state is compared to the country as a whole, across  recent elections). \n",
    "\n",
    "The variable `net_usa` has the [country-wide net presidential approval](https://projects.fivethirtyeight.com/trump-approval-ratings/) by month."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# From https://github.com/fivethirtyeight/data/tree/master/partisan+lean\n",
    "# A dict of {\"state name\": (electoral_votes, partisan_lean)}\n",
    "state_data = {        \n",
    "  \"Alabama\": (9, +27),         \"Alaska\": (3, +15),          \"Arizona\": (11, +9),         \n",
    "  \"Arkansas\": (6, +24),        \"California\": (55, -24),     \"Colorado\": (9, -1),          \n",
    "  \"Connecticut\": (7, -11),     \"Delaware\": (3, -14),        \"District of Columbia\": (3, -43),\n",
    "  \"Florida\": (29, +5),         \"Georgia\": (16, +12),        \"Hawaii\": (4, -36),           \n",
    "  \"Idaho\": (4, +35),           \"Illinois\": (20, -13),       \"Indiana\": (11, +18),        \n",
    "  \"Iowa\": (6, +6),             \"Kansas\": (6, +23),          \"Kentucky\": (8, +23),        \n",
    "  \"Louisiana\": (8, +17),       \"Maine\": (4, -5),            \"Maryland\": (10, -23),        \n",
    "  \"Massachusetts\": (11, -29),  \"Michigan\": (16, -1),        \"Minnesota\": (10, -2),        \n",
    "  \"Mississippi\": (6, +15),     \"Missouri\": (10, +19),       \"Montana\": (3, +18),         \n",
    "  \"Nebraska\": (5, +24),        \"Nevada\": (6, +1),           \"New Hampshire\": (4, +2),    \n",
    "  \"New Jersey\": (14, -13),     \"New Mexico\": (5, -7),       \"New York\": (29, -22),        \n",
    "  \"North Carolina\": (15, +5),  \"North Dakota\": (3, +33),    \"Ohio\": (18, +7),            \n",
    "  \"Oklahoma\": (7, +34),        \"Oregon\": (7, -9),           \"Pennsylvania\": (20, +1),    \n",
    "  \"Rhode Island\": (4, -26),    \"South Carolina\": (9, +17),  \"South Dakota\": (3, +31),    \n",
    "  \"Tennessee\": (11, +28),      \"Texas\": (38, +17),          \"Utah\": (6, +31),            \n",
    "  \"Vermont\": (3, -24),         \"Virginia\": (13, 0),         \"Washington\": (12, -12),      \n",
    "  \"West Virginia\": (5, +30),   \"Wisconsin\": (10, +1),       \"Wyoming\": (3, +47)}\n",
    "\n",
    "# From https://projects.fivethirtyeight.com/trump-approval-ratings/\n",
    "# A dict of {'date': country-wide-net-approval}, taken from 1st of month.\n",
    "net_usa = {\n",
    "  'Jan-17':  10, 'Jan-18': -18, 'Jan-19': -12, 'Jan-20': -11,\n",
    "  'Feb-17':   0, 'Feb-18': -15, 'Feb-19': -16, 'Feb-20': -10,\n",
    "  'Mar-17':  -6, 'Mar-18': -14, 'Mar-19': -11, \n",
    "  'Apr-17': -13, 'Apr-18': -13, 'Apr-19': -11, \n",
    "  'May-17': -11, 'May-18': -12, 'May-19': -12,       \n",
    "  'Jun-17': -16, 'Jun-18': -11, 'Jun-19': -12,      \n",
    "  'Jul-17': -15, 'Jul-18': -10, 'Jul-19': -11,      \n",
    "  'Aug-17': -19, 'Aug-18': -12, 'Aug-19': -10,    \n",
    "  'Sep-17': -20, 'Sep-18': -14, 'Sep-19': -13, \n",
    "  'Oct-17': -17, 'Oct-18': -11, 'Oct-19': -13,   \n",
    "  'Nov-17': -19, 'Nov-18': -11, 'Nov-19': -13,\n",
    "  'Dec-17': -18, 'Dec-18': -10, 'Dec-19': -12,\n",
    "  }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the code to parse and manipulate the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class State(namedtuple('_', 'name, ev, lean, approvals, disapprovals')):\n",
    "    '''A State has a name, the number of electoral votes, the partisan lean,\n",
    "    and two dicts of {date: percent}: approvals and disapprovals.'''\n",
    "\n",
    "def parse_page(filename, state_data=state_data) -> tuple:\n",
    "    \"Read data from the file and return tuple: (list of `State`s, list of dates).\"\n",
    "    text = re.findall(r'\\[\\[.*?\\]\\]', open(filename).read())[0]\n",
    "    header, *table  = ast.literal_eval(text)\n",
    "    dates = [date32(d) for d in header[1::2]]\n",
    "    states = [State(name, *state_data[name],\n",
    "                    approvals=dict(zip(dates, map(int, numbers[0::2]))),\n",
    "                    disapprovals=dict(zip(dates, map(int, numbers[1::2]))))\n",
    "              for (name, *numbers) in table]\n",
    "    return states, dates\n",
    "\n",
    "def date32(date) -> str: \n",
    "    \"\"\"Convert a date to 3-2 format: date32(\"June '17\") => 'Jun-17'.\"\"\"\n",
    "    return date[:3] + '-' + date[-2:]\n",
    "\n",
    "states, dates = parse_page('evs.html')\n",
    "now = dates[-1]\n",
    "\n",
    "def EV(states, date=now, swing=0) -> int:\n",
    "    \"Total electoral votes of states with net positive approval (plus half for net zero).\"\n",
    "    return sum(s.ev * is_positive(net(s, date) + swing) for s in states)\n",
    "\n",
    "def is_positive(x) -> int:\n",
    "    \"1 if x is positive; 0 if x is negative; 1/2 if x is zero.\"\n",
    "    return 1/2 if x == 0 else int(x > 0)\n",
    "\n",
    "def margin(states, date=now) -> int:\n",
    "    \"What's the least swing that would lead to a majority?\"\n",
    "    return min(swing for swing in range(-50, 50) if EV(states, date, swing+0.1) >= 270)\n",
    "\n",
    "def net(state, date=now)         -> int:   return state.approvals[date] - state.disapprovals[date]\n",
    "def undecided(state, date=now)   -> int:   return 100 - state.approvals[date] - state.disapprovals[date]\n",
    "def movement(state, date=now)    -> float: return undecided(state, date) / 5 + 2 * 𝝈(state)\n",
    "def 𝝈(state, recent=dates[-12:]) -> float: return stdev(net(state, d) for d in recent)\n",
    "def is_swing(state)              -> bool:  return abs(net(state)) < movement(state)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Various functions for displaying data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def header(head) -> str: return head + '\\n' + '-'.join('|' * head.count('|'))\n",
    "\n",
    "def markdown(fn) -> callable: return lambda *args: display(Markdown('\\n'.join(fn(*args))))\n",
    "\n",
    "def parp(state, date=now) -> int: return net(state, date) - state.lean\n",
    "\n",
    "def grid(dates, xlab, ylab): \n",
    "    plt.minorticks_on(); plt.grid(which='minor', axis='y', ls=':', alpha=0.7)\n",
    "    plt.xticks(range(len(dates)), dates, rotation=90, fontfamily='monospace')\n",
    "    plt.xlabel(xlab); plt.ylabel(ylab); plt.legend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def show_evs(states=states, dates=dates, swing=3):\n",
    "    \"A plot of electoral votes by month.\"\n",
    "    plt.rcParams[\"figure.figsize\"] = [10, 7]\n",
    "    plt.style.use('fivethirtyeight')\n",
    "    N = len(dates)\n",
    "    err = [[EV(states, date) - EV(states, date, -swing) for date in dates],\n",
    "           [EV(states, date, +swing) - EV(states, date) for date in dates]]\n",
    "    plt.plot(range(N), [270] * N, color='darkorange', label=\"270 EVs\", lw=2)\n",
    "    plt.errorbar(range(N), [EV(states, date) for date in dates], fmt='D-',\n",
    "                 yerr=err, ecolor='grey', capsize=7, label='Trump EVs ±3% swing', lw=2)\n",
    "    grid(dates, 'Date', 'Electoral Votes')\n",
    "    #labels('Date', 'Electoral Votes')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "def show_approval(states=states, dates=dates):\n",
    "    \"A plot of net popularity by month.\"\n",
    "    plt.rcParams[\"figure.figsize\"] = [10, 7]\n",
    "    plt.style.use('fivethirtyeight')\n",
    "    N = len(dates)\n",
    "    plt.plot(range(N), [0] * N, label='Net zero', color='darkorange')\n",
    "    plt.plot(range(N), [-margin(states, date) for date in dates], 'D-', label='Margin to 270')\n",
    "    plt.plot(range(N), [net_usa[date] for date in dates], 'go-', label='Country-wide Net')\n",
    "    grid(dates, 'Date', 'Net popularity')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def show_swings(swings=range(10)):\n",
    "    print('Swing   EV Range')\n",
    "    for swing in swings:\n",
    "        s = swing + 0.5\n",
    "        print(f'±{s:3.1f}%  {EV(states, swing=-s):3} to {EV(states, swing=s):3}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "@markdown\n",
    "def show_states(states=states, d=now, ref='Jan-17'):\n",
    "    \"A table of states, sorted by net approval, with electoral votes.\"\n",
    "    total = 0\n",
    "    yield header(f'|State|Net|Move|EV|ΣEV|+|−|?|𝝈|')\n",
    "    for s in sorted(states, key=net, reverse=True):\n",
    "        total += s.ev\n",
    "        b = '**' * is_swing(s)\n",
    "        yield (f'|{swing_name(s)}|{b}{net(s, d):+d}%{b}|{b}±{movement(s):.0f}%{b}|{s.ev}|{total}'\n",
    "               f'|{s.approvals[d]}%|{s.disapprovals[d]}%|{undecided(s, now)}%|±{𝝈(s):3.1f}%')\n",
    "        \n",
    "def swing_name(s) -> str: return ('**' + s.name.upper() + '**') if is_swing(s) else s.name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "@markdown\n",
    "def show_parp(states=states, dates=(now, 'Jan-19', 'Jan-18', 'Jan-17')):\n",
    "    \"A table of states, sorted by Popularity Above Replacement President.\"\n",
    "    def year(date): return '' if date == now else \"'\" + date[-2:]\n",
    "    fields = [f\"PARP{year(date)}|(Net)\" for date in dates]\n",
    "    yield header(f'|State|Lean|EV|{\"|\".join(fields)}|')\n",
    "    for s in sorted(states, key=parp, reverse=True):\n",
    "        fields = [f'{parp(s, date):+d}|({net(s, date):+d})' for date in dates]\n",
    "        yield f'|{swing_name(s)}|{s.lean:+d}|{s.ev}|{\"|\".join(fields)}|'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show how to make all Electoral Vote totals:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def all_totals(states) -> dict:\n",
    "    \"\"\"Return a dict of {ev_total: [state, ...]}\"\"\"\n",
    "    totals = {0: []}\n",
    "    for s in sorted(states, key=lambda s: s.ev):\n",
    "        new = {}\n",
    "        for t in totals:\n",
    "            new[t + s.ev] = shorter(totals.get(t + s.ev), totals[t] + [s.name])\n",
    "        totals = {**totals, **new}\n",
    "    return totals\n",
    "                                    \n",
    "def shorter(A, B):\n",
    "    \"\"\"Return the sequence that is shorter, but always return B if A is None.\"\"\"\n",
    "    return B if A is None or len(A) > len(B) else A\n",
    "\n",
    "def show_totals():\n",
    "    districts = [State('Maine-1', 1, 0, 0, 0), State('Maine-2', 1, 0, 0, 0)]\n",
    "    totals = all_totals(states + districts)\n",
    "    for i in range(270):\n",
    "        print(f\"{i:3} [{' + '.join(totals[i])}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Tests** (I really should have more):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert len(states) == 51,                \"50 states plus DC\"\n",
    "assert all(s.ev >= 3 for s in states),   \"All states have two senators and at least one rep.\"\n",
    "assert sum(s.ev for s in states) == 538, \"Total of 538 electoral votes.\""
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
